Searching the Web with Server-side Filtering of Irrelevant Information
Authors
Abstract
Even experienced users of IR systems experience a high degree of frustration in searching for information on the World Wide Web, in part because current search engines concentrate on speed and coverage at the expense of precision. In this paper, we describe an approach to increasing the precision of retrieval based on filtering out irrelevant material. Potentially relevant matches retrieved from a standard Web search engine are filtered using, for example, augmented patterns derived from the syntactic structure inherent in natural language text. We argue that the performance of these and other methods of filtering for IR can be improved by the notion of server-side scripting, a concept which has not yet been exploited. We describe an implementation of such a system, and discuss issues that arise out of this model of improving IR. We conclude with a discussion of areas where this mode of filtering is most appropriate.
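The filtering step the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation; the pattern set `RELEVANCE_PATTERNS`, the function `filter_results`, and the sample query and documents are all hypothetical, and simple regular expressions stand in for the paper's "augmented patterns derived from syntactic structure":

```python
import re

# Hypothetical stand-ins for the paper's augmented syntactic patterns:
# each pattern encodes a phrase structure a relevant document should
# contain for the example query "who invented the telephone".
RELEVANCE_PATTERNS = [
    re.compile(r"\binvented\s+(?:the\s+)?telephone\b", re.IGNORECASE),
    re.compile(r"\btelephone\s+was\s+invented\s+by\b", re.IGNORECASE),
]

def filter_results(documents):
    """Server-side filter: keep only documents that match at least one
    relevance pattern, discarding keyword hits that lack the expected
    phrase structure."""
    return [doc for doc in documents
            if any(p.search(doc) for p in RELEVANCE_PATTERNS)]

# Candidate matches as a search engine might return them.
docs = [
    "Alexander Graham Bell invented the telephone in 1876.",
    "Buy cheap telephones online today!",
    "The telephone was invented by Bell.",
]
print(filter_results(docs))
```

Running the filter on the server rather than the client is the point of the paper's architecture: only the first and third documents, which exhibit the expected syntactic pattern, would be shipped back to the user.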
Related resources
The Application of Semantic Web Ontologies in Medical Information Systems
One of the challenges of current medical information systems, which are based on keyword searching, is that they may retrieve a large amount of irrelevant information. These systems also do not provide interoperability among healthcare systems. To address these challenges, and for the purpose of greater interoperability between user and machine, the semantic web (Web 3.0) has been d...
Full text
Optimizing the Execution and Response of Web Pages in the Cloud Using Preprocessing Methods: A Case Study of Varnish and Nginx
The response speed of Web pages is one of the necessities of information technology. In recent years, companies such as Google and computer scientists have focused on speeding up the web. Achievements such as Google PageSpeed, Nginx, and Varnish are the result of these efforts. In Customer-to-Customer (C2C) business systems, such as chat systems, and in Business-to-Customer (B2C) systems, s...
Full text
Designing a Volunteer Geographic Information-based Service for Rapid Earthquake Damage Estimation
Introduction: The advent of Web 2.0 enables users to interact and produce unlimited free real-time data. This advantage leads us to exploit Volunteer Geographic Information (VGI) for real-time crisis management. Traditional estimation methods for earthquake damages are expensive and tim...
Full text
A Tutorial on Information Filtering Concepts and Methods for Bio-medical Searching
Vast amounts of information are now widely accessible on the web. Customarily, when a user wants to find interesting documents or data sources, the user has to actively search the World Wide Web. Searchers require effective means to efficiently find the information that they really need, and to avoid the irrelevant information that does not match their interests. Information retrieval [1,2], and ...
Full text
Reducing Network Traffic and Managing Volatile Web Contents Using Migrating Crawlers with a Table of Variable Information
As the size of the web continues to grow, searching it for useful information has become increasingly difficult. Studies also report that a significant portion of current Internet traffic and bandwidth consumption is due to the web crawlers that retrieve pages for indexing by the different search engines. Moreover, due to the dynamic nature of the web, it becomes very difficult for a search engine to prov...
Full text
Journal title:
Volume  Issue
Pages -
Publication date: 1997